Welcome to the ATFS (Alliance for Tropical Forest Science) data harmonization app!
The app is a tool meant to be used by 2 or more networks that are planning to combine their data for a common analysis.
The app relies on “Profiles” that indicate how the data is stored in the file(s) provided: names of columns storing the DBH, the census ID, the tree tag, units of measurements etc…
A profile is a .rds file that is downloaded via the app once all the information about the data has been provided in the Headers and Units tab of the app.
The same profile can be uploaded as “input profile” in the Headers and Units tab, to speed up the process once your network’s data has been profiled, and/or as “output profile” in the Output format tab, to transform other networks’ data into that profile.
Some networks have their profile stored within the app.
The app only accepts CSV files.
It performs best if all the information that you want to share is collated into one analytical file, so we recommend that you append your species and plot information to your measurement information beforehand, and upload that one bigger file into the app.
That said, you can decide to use the app to do exactly that. There is no limit to the number of files you can upload, but they all need to connect to each other in one way or another, so that by stacking and/or merging them it is possible to collate them down to one file. We will get to this in more detail in a moment.
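If you prefer to prepare the single analytical file yourself before uploading, the idea can be sketched in base R as below. This is only an illustration: the table names, column names, values and output file name are all hypothetical and should be replaced by your own.

```r
# Hypothetical measurement table (one row per tree measurement)
measurements <- data.frame(
  tag     = c("T1", "T2", "T3"),
  sp_code = c("AB", "CD", "AB"),
  dbh     = c(12.5, 30.1, 8.4)
)

# Hypothetical species table, linked to the measurements by "sp_code"
species <- data.frame(
  sp_code = c("AB", "CD"),
  species = c("Species A", "Species B")
)

# Append the species information to every measurement row
# (all.x = TRUE keeps measurements even if a species code is unknown)
analytical <- merge(measurements, species, by = "sp_code", all.x = TRUE)

# Write one CSV file, which is the only format the app accepts
csv_path <- file.path(tempdir(), "analytical_file.csv")
write.csv(analytical, csv_path, row.names = FALSE)
```

The resulting CSV file is the one you would then upload as a single table.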
The app also relies on tidy
data, which means that every column is a variable, every row is an
observation and every cell is a single value. For example, a data set
with multiple columns for the DBH measurement (e.g. DBH_2015, DBH_2020
etc…) is not a tidy data set. Instead, there should be a column for the
variable year (which, in our example, will take a value of
2015 or 2020), and a column for DBH. If your data is not in
a tidy format, the Tidy table tab
will help you reshape your data.
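To make the DBH_2015 / DBH_2020 example concrete, here is a minimal base R sketch of what the reshaping does. It is not the app's own code; the tag column and the values are made up for illustration.

```r
# Hypothetical wide (non-tidy) table: one DBH column per census year
wide <- data.frame(
  tag      = c("T1", "T2"),
  DBH_2015 = c(10.2, 25.0),
  DBH_2020 = c(11.0, 26.3)
)

dbh_cols <- c("DBH_2015", "DBH_2020")

# Build one block of rows per DBH column, then stack the blocks:
# the year is read from the column name, the value goes into "DBH"
long <- do.call(rbind, lapply(dbh_cols, function(col) {
  data.frame(
    tag  = wide$tag,
    year = as.integer(sub("DBH_", "", col)),
    DBH  = wide[[col]]
  )
}))

long
```

The tidy table has one row per tree and year, with a year column and a single DBH column.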
The app relies on functions that are maintained in a GitHub R package located here: https://github.com/Alliance-for-Tropical-Forest-Science/DataHarmonization.
We recommend running the app on your local machine (via R and RStudio) if one of the following cases applies to you:
To open the app in R, you will need to install the DataHarmonization R package and launch Shiny with the following lines of code.
# install the R package
devtools::install_github("Alliance-for-Tropical-Forest-Science/DataHarmonization", build_vignettes = TRUE)
#> Downloading GitHub repo Alliance-for-Tropical-Forest-Science/DataHarmonization@HEAD
Note that you may need to install the devtools package first
and that installing the DataHarmonization R package may ask you to
update a list of packages.
You’ll want to re-install the package every once in a while, to get the latest version of the app.
If you don’t have R and RStudio and if your data is not too big, you can choose to run the online version of the app by clicking on this link. Note that the online version may be lagging behind the GitHub version.
Once the app is launched you can start interacting with it.
There are multiple tabs to go through. Some tabs will be skipped automatically if they don’t apply to your situation and you may skip others if you don’t need/want them.
When you land on a tab, always advance with an action button (even if skipping) so your inputs are taken into account. You may use the navigation panel to return to a previous tab but remember to click on an action button to save your updated entries.
This tab starts with information that we already covered in the intro. The checklist is only a guideline to help you get ready, and you don’t actually need to check the boxes to move on.
The numbered tasks are the elements that you do need to complete to be able to move forward.
Indicate how many tables you wish to upload
Indicate the finest level of measurement in your data:
Again, even if you are uploading plot-level information, if you also have stem-level data you should upload that file as well and indicate that your level of measurement is “Stem”.
Upload your tables. You’ll have as many upload boxes as you indicated needing in step 1. For each of them:
Click on Browse... and navigate to the CSV file you want to upload.
Click on SUBMIT to proceed to the next step.
If you uploaded more than one table, you will be prompted to the
Stack tables tab, but this tab will be skipped if you only
uploaded one table.
You will need to stack 2 or more tables if you are collecting the same information in multiple files. This can be the case if, for example, you are keeping your measurements from different plots in different files, or if you are keeping one file per census.
If you don’t need to stack tables, click on SKIP THIS STEP.
It is important that the files you are stacking have the same set of columns.
Select all the tables that need to be stacked
Click on STACK TABLES
Double check your newly created table looks ok
Click on GO TO MERGE to proceed to the next step. (Note: if you are down to one table at this stage the button’s label will change, so click SKIP MERGING SINCE ALL YOUR DATA IS NOW STACKED).
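Conceptually, stacking is a row-bind of tables that share the same set of columns. The following base R sketch illustrates this with two hypothetical census files; it is not the app's own implementation, and the column names are assumptions.

```r
# Two hypothetical tables holding the same information for two censuses
# (the column order differs, but the set of columns is the same)
census_1 <- data.frame(tag = c("T1", "T2"), census = 1, dbh = c(10.2, 25.0))
census_2 <- data.frame(dbh = c(11.0, 26.3), tag = c("T1", "T2"), census = 2)

tables <- list(census_1, census_2)

# Check that every table has the same set of columns as the first one
same_columns <- all(vapply(
  tables,
  function(x) setequal(names(x), names(tables[[1]])),
  logical(1)
))
stopifnot(same_columns)

# rbind() on data frames matches columns by name, so order does not matter
stacked <- do.call(rbind, tables)
stacked
```

If the check fails, the tables do not describe the same information and should probably be merged rather than stacked.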
If you uploaded more than one table and not all of them were stacked,
you will be prompted to the Merge tables tab, but this tab
will be skipped if you only uploaded one table, or if all your tables
were stacked.
At the end of this stage you have to be down to one table.
You need to use merging if, e.g., your species or your plot information is stored in a different table than your measurement table, and there is at least one “key” column that you can use to connect the tables together.
In Merge this table, select the main measurement table (the one into which you want to merge extra information from other tables). Note that this may be your now stacked table.
In And this table, select the table that you want to bring information from.
Click on both blue arrows.
In the two dropdown menus about the ‘KEY column(s)’, select the column(s) that allow you to connect the two tables together. Select all columns that are common between tables, otherwise columns will be repeated in the output, with extension ‘.y’ in the name of the second table.
Click on ‘MERGE TABLES’.
If you are still not down to one table, another box will appear. Repeat 1-5 with the remaining tables.
Click on ‘GO TO TIDY’.
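The effect of the KEY column(s) can be illustrated with base R's merge(), which behaves in the way described above for repeated columns. This is a sketch under the assumption that the app's merge works similarly; the tables and column names are hypothetical.

```r
# Hypothetical measurement table and plot table; both have "plot" and "site"
measurements <- data.frame(
  plot = c("P1", "P1", "P2"),
  tag  = c("T1", "T2", "T3"),
  site = "S1",
  dbh  = c(10.2, 25.0, 8.4)
)
plots <- data.frame(
  plot    = c("P1", "P2"),
  site    = "S1",
  area_ha = c(1, 0.5)
)

# Only one of the two shared columns is used as key:
# the other shared column is repeated as site.x and site.y
partial <- merge(measurements, plots, by = "plot", all.x = TRUE)
names(partial)

# All shared columns are used as keys: no repeated columns
full <- merge(measurements, plots, by = c("plot", "site"), all.x = TRUE)
names(full)
```

Selecting every shared column as a key keeps the merged table free of duplicated columns.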
At this stage, we want to make sure your data has one row per observation and one variable per column.
If you collected the same type of information in several columns (e.g. you added a column each time you visited a tree, or for each stem of the tree etc…), you need to “tidy” your table (also called wide-to-long reshaping).
In the top-most box, use the radio-buttons to indicate the reason you added new columns for a new observation.
The next set of boxes are pre-filled with our best guesses at the
columns that may contain the same variable (columns that have similar
names like dbh1, dbh2, or year1,
year2…). Our guess may be terrible. Your role is to:
Indicate the name of the new column that you wish your variable
to be called (e.g. dbh) in the text box. Note that this
should start with a letter and contain no spaces.
Select all the columns of your data that represent the variable
indicated in step a. (e.g. dbh1 and dbh2) using
the drop-down menu.
Tick the little tick-box on the upper-left corner of the box, to indicate that you do want to take into account what you selected.
Repeat a-c for the next variable(s), e.g. you may need a box to
indicate year in the text box and year1 and
year2 in the drop-down menu. Don’t forget to tick the
tick-box for those variables too.
Click on ‘TIDY’
Click on ‘GO TO HEADERS’
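The steps above amount to a wide-to-long reshape of several variables at once. Below is a minimal sketch using base R's reshape(), assuming the dbh1/dbh2 and year1/year2 columns from the example; the tag column, the "visit" column name and the values are hypothetical, and the app may do this differently internally.

```r
# Hypothetical wide table: one dbh and one year column per visit
wide <- data.frame(
  tag   = c("T1", "T2"),
  dbh1  = c(10.2, 25.0),
  dbh2  = c(11.0, 26.3),
  year1 = c(2015, 2015),
  year2 = c(2020, 2020)
)

# Names of the new columns: must start with a letter and contain no spaces
new_names <- c("dbh", "year")
stopifnot(all(grepl("^[A-Za-z][^[:space:]]*$", new_names)))

# Each element of 'varying' lists the columns holding the same variable
long <- reshape(
  wide,
  direction = "long",
  varying   = list(c("dbh1", "dbh2"), c("year1", "year2")),
  v.names   = new_names,
  timevar   = "visit",
  idvar     = "tag"
)
rownames(long) <- NULL
long
```

Each tree now has one row per visit, with a single dbh column and a single year column.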
Here, we want to know in what column some key information is stored.
If…
Click on ‘APPLY CHANGES’ and read any warnings that may pop up, adjust your entries if possible (it is okay to ignore warnings) and re-apply your changes. Save (or update if needed) your profile (.rds file).
Check that your newly formatted data looks ok. The headers and units are now following ATFS’s standard. You can see what those are by clicking the little button.
Click on ‘NEXT’.
You’ll be prompted to this tab if you indicated column(s) for ‘tree codes’ in the Tree Measurement section of the previous tab.
The table shows the list of codes that are available in the column(s) you indicated. If you intend to translate these codes to match the ones of another profile (which you will be able to do at a later step), or vice versa, you need to fill this table out.
Once you are done with the table you can update your profile (by downloading and overwriting your .rds file), so it will be faster next time. If you have already saved your profile after this step, and used it to fill out the previous step, you can click on the “Use your profile” button to automatically fill the table.
There is a list of predefined definitions, which, if used by you and your collaborator, will help automatically translate your codes. But if you can’t find a definition that matches yours, just type your own.
Corrections may be applied to botanical names, life status and/or diameters.
If you are planning to use corrections, we recommend that you coordinate with your collaborator(s), because it may make more sense to first skip them and apply them in a new app session in which the collated data set is uploaded, to ensure uniformity of corrections across data sets.
If you select “Yes” to any of the corrections, you will be able to change the default parameters. We recommend keeping most parameters at their default values.
The functions will add new columns to your data (indicated with the
_DataHarmonizationCor suffix) and won’t alter your original
columns.
Note that some of these functions are quite slow and it can take a few minutes for the output to show up.
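Once you have downloaded the corrected data, the added columns can be located by their suffix. The sketch below assumes, for illustration only, that a corrected column is named after the original column plus the suffix; the "Diameter" column and its values are hypothetical.

```r
# Hypothetical output: the original column is untouched,
# the corrected values live in a new, suffixed column
corrected <- data.frame(
  Diameter                      = c(10.2, 250, 11.0),
  Diameter_DataHarmonizationCor = c(10.2, 25.0, 11.0)
)

# Find the columns added by the corrections
cor_cols <- grep("_DataHarmonizationCor$", names(corrected), value = TRUE)

# Assumed naming: strip the suffix to get the original column name
orig_cols <- sub("_DataHarmonizationCor$", "", cor_cols)

# Rows where the correction changed the value
changed <- corrected[[orig_cols]] != corrected[[cor_cols]]
corrected[changed, ]
```

Comparing the two columns side by side is a quick way to review what a correction actually changed.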
You can visualize the correction applied in the tabs below the correction boxes.
This is where the magic happens.
Select or upload an output profile (.rds file) and click on Apply Profile. This will transform your data’s units and column names into the output profile’s headers and units. If you and your collaborator have a code table, you will be prompted to the code translation table. If not, you can check that the output data looks good and move on to ‘the download tab’.
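To give an idea of what applying an output profile means, here is a simplified sketch of a unit conversion followed by a renaming of columns. It is not the app's code: the column names, the renaming map and the choice of millimetres to centimetres are all assumptions made for the example.

```r
# Hypothetical input data, with DBH stored in millimetres
input <- data.frame(dbh_mm = c(102, 250), tree_tag = c("T1", "T2"))

# Hypothetical mapping from input column names to output column names
rename_map <- c(dbh_mm = "Diameter", tree_tag = "TreeTag")

# Hypothetical unit conversion: output profile expects centimetres
mm_to_cm <- 0.1

output <- input
output$dbh_mm <- output$dbh_mm * mm_to_cm            # convert units
names(output) <- unname(rename_map[names(output)])   # rename columns
output
```

In the app, both the unit conversions and the new column names are driven by the input and output profiles rather than written by hand.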
Note: When you upload a profile, there is a chance that it is
“obsolete” if the profile was created on an earlier version of the app.
If that is the case, you’ll be notified by a pop up window and the items
missing in the profile will be listed. To be able to use that profile,
you will need an updated version of it (see How to update a profile in the Troubleshooting section
below).
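Since a profile is a list, the idea of "missing items" can be pictured as a comparison between the names the app expects and the names present in the profile. The sketch below is illustrative only: the profile content and the list of expected items are hypothetical, and the app performs its own check.

```r
# Hypothetical profile created with an earlier version of the app
profile <- list(Diameter = "dbh", DiameterUnit = "cm")

# Hypothetical list of items expected by the current version
expected_items <- c("Diameter", "DiameterUnit", "Cluster")

# Items that the profile lacks
missing_items <- setdiff(expected_items, names(profile))
missing_items
```

With a real profile, you would read the list with readRDS() and rely on the items listed in the pop up window.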
This section of the tab will only appear if you had to fill in the
code table in the Codes tab and the
output profile you selected or uploaded also has a code table. You can
now look through both sets of codes and indicate the ones that are
equivalent by checking the corresponding radio button(s).
If you hover over the column names, you will see the definitions of the output profile.
Click on “See definition” to double check that the mapping of codes is what you intended.
When you are happy, apply the mapping and you will see columns added to your table. They will have the column names that the output profile expects and will be filled with the output codes, based on the codes in your column(s) and the mapping you indicated.
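The mapping step can be thought of as a lookup from your codes to the output profile's codes. The following sketch shows that idea with a named vector; the codes, column names and mapping are hypothetical, and the app's actual output columns follow the output profile.

```r
# Hypothetical data with a tree code column
data <- data.frame(
  tag  = c("T1", "T2", "T3"),
  code = c("A", "D", "A")
)

# Hypothetical mapping: names are your codes, values are the output codes
code_map <- c(A = "alive", D = "dead")

# Add a new column filled with the output codes
data$output_code <- unname(code_map[as.character(data$code)])
data
```

Codes without an equivalent in the mapping would come out as NA in this sketch.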
Clicking on “save all” generates a zip file with all the files you should need. If you have tree codes and applied a tree code mapping, the zip will include a CSV file with the translation table (which you can upload in the code translation step to speed up the process next time.)
This is a simplified version of this tutorial.
If your output profile (.rds file) is obsolete there are two options, which should be done by the person who created the profile in the first place:
For R savvy people: open the .rds file in R. It is a
list. Add the missing element(s) that were listed in the
pop up window that said that the profile was obsolete by giving them the
value that applies to the output profile. For example, if “Cluster” was
missing you can do:
profile <- readRDS("[YOUR PATH]/[YOUR PROFILENAME].rds")
profile$Cluster <- "none"
saveRDS(profile, "[YOUR PATH]/[YOUR PROFILENAME].rds")
For non-R savvy people: go through the app again. It should go fast as you will be able to upload your obsolete profile as an input. Make sure to save and overwrite your profile. If the elements missing did not apply to you, they will now exist with value “none” in your profile.